Results 1 - 20 of 32
1.
Genome Biol Evol ; 16(4), 2024 Apr 02.
Article in English | MEDLINE | ID: mdl-38518756

ABSTRACT

Ancestral reconstruction is a widely used technique that has been applied to understand the evolutionary history of gain and loss of gene families. Ancestral gene content can be reconstructed via different phylogenetic methods, but many current and previous studies employ Dollo parsimony. We hypothesize that Dollo parsimony is not appropriate for ancestral gene content reconstruction inferences based on sequence homology, as Dollo parsimony is derived from the assumption that a complex character cannot be regained. This premise does not accurately model molecular sequence evolution, in which false orthology can result from sequence convergence or lateral gene transfer. The aim of this study is to test Dollo parsimony's suitability for ancestral gene content reconstruction and to compare its inferences with a maximum likelihood-based approach that allows a gene family to be gained more than once within a tree. We first compared the performance of the two approaches on a series of artificial data sets each of 5,000 genes that were simulated according to a spectrum of evolutionary rates without gene gain or loss, so that inferred deviations from the true gene count would arise only from errors in orthology inference and ancestral reconstruction. Next, we reconstructed protein domain evolution on a phylogeny representing known eukaryotic diversity. We observed that Dollo parsimony produced numerous ancestral gene content overestimations, especially at nodes closer to the root of the tree. These observations led us to the conclusion that, confirming our hypothesis, Dollo parsimony is not an appropriate method for ancestral reconstruction studies based on sequence homology.


Subject(s)
Evolution, Molecular , Phylogeny , Likelihood Functions
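The argument above turns on how Dollo parsimony places a single gain, so a minimal Python sketch may help. It is not the authors' implementation: it scores one gene family on a hypothetical four-tip tree, assuming presence/absence at the tips is already known. It shows how one extra, possibly spurious, homolog in tip D (for example from false orthology) drags the inferred gain up to the root and adds presence at internal node Y.

```python
from typing import Dict, List, Set

# Rooted tree as a mapping from each internal node to its children.
Tree = Dict[str, List[str]]


def leaves_below(tree: Tree, node: str) -> Set[str]:
    """Return the set of tips in the subtree rooted at `node`."""
    children = tree.get(node, [])
    if not children:
        return {node}
    tips: Set[str] = set()
    for child in children:
        tips |= leaves_below(tree, child)
    return tips


def dollo_presence(tree: Tree, root: str, present_tips: Set[str]) -> Set[str]:
    """Return every node (internal or tip) inferred to carry the gene family."""
    if not present_tips:
        return set()
    # Single gain: walk down from the root to the MRCA of the present tips,
    # i.e. the deepest node whose subtree still contains all of them.
    mrca = root
    while True:
        deeper = [c for c in tree.get(mrca, []) if present_tips <= leaves_below(tree, c)]
        if not deeper:
            break
        mrca = deeper[0]
    # Below the gain no regain is allowed, so any node with a present
    # descendant tip must carry the family; other lineages lose it once.
    inferred: Set[str] = set()
    stack = [mrca]
    while stack:
        node = stack.pop()
        if leaves_below(tree, node) & present_tips:
            inferred.add(node)
            stack.extend(tree.get(node, []))
    return inferred


# Hypothetical four-tip tree: ((A,B)X,(C,D)Y)root.
tree: Tree = {"root": ["X", "Y"], "X": ["A", "B"], "Y": ["C", "D"]}
print(sorted(dollo_presence(tree, "root", {"A", "B"})))
# ['A', 'B', 'X']
print(sorted(dollo_presence(tree, "root", {"A", "B", "D"})))
# ['A', 'B', 'D', 'X', 'Y', 'root']
```

A model that allows a family to be gained more than once, like the maximum likelihood approach compared in the study, could instead explain D by a separate gain on its own branch.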
2.
Proc Natl Acad Sci U S A ; 119(35): e2206610119, 2022 08 30.
Article in English | MEDLINE | ID: mdl-35947637

ABSTRACT

The coronavirus disease 2019 (COVID-19) pandemic is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), a coronavirus that spilled over from the bat reservoir. Despite numerous clinical trials and vaccines, the burden remains immense, and the host determinants of SARS-CoV-2 susceptibility and COVID-19 severity remain largely unknown. Signatures of positive selection detected by comparative functional genetic analyses in primate and bat genomes can uncover important and specific adaptations that occurred at virus-host interfaces. We performed high-throughput evolutionary analyses of 334 SARS-CoV-2-interacting proteins to identify SARS-CoV adaptive loci and uncover functional differences between modern humans, primates, and bats. Using DGINN (Detection of Genetic INNovation), we identified 38 bat and 81 primate proteins with marks of positive selection. Seventeen genes, including the ACE2 receptor, present adaptive marks in both mammalian orders, suggesting common virus-host interfaces and past epidemics of coronaviruses shaping their genomes. Yet, 84 genes presented distinct adaptations in bats and primates. Notably, residues involved in ubiquitination and phosphorylation of the inflammatory RIPK1 have rapidly evolved in bats but not in primates, suggesting that bats regulate inflammation differently from humans. Furthermore, we discovered residues with typical virus-host arms-race marks in primates, such as in the entry factor TMPRSS2 or the autophagy adaptor FYCO1, pointing to host-specific in vivo interfaces that may be drug targets. Finally, we found that FYCO1 sites under adaptation in primates are those associated with severe COVID-19, supporting their importance in pathogenesis and replication. Overall, we identified adaptations involved in SARS-CoV-2 infection in bats and primates, shedding light on modern genetic determinants of virus susceptibility and severity.


Subject(s)
COVID-19 , Chiroptera , Evolution, Molecular , Host Adaptation , Primates , SARS-CoV-2 , Spike Glycoprotein, Coronavirus , Animals , COVID-19/genetics , Chiroptera/virology , Genetic Predisposition to Disease , Host Adaptation/genetics , Humans , Pandemics , Primates/genetics , Primates/virology , SARS-CoV-2/genetics , Selection, Genetic , Spike Glycoprotein, Coronavirus/genetics
3.
Curr Biol ; 32(10): 2325-2333.e6, 2022 05 23.
Article in English | MEDLINE | ID: mdl-35483362

ABSTRACT

Cytoplasmic male sterility (CMS) is a form of genetic conflict over sex determination that results from differences in modes of inheritance between genomic compartments.1-3 Indeed, maternally transmitted (usually mitochondrial) genes sometimes enhance their transmission by suppressing the male function in a hermaphroditic organism to the detriment of biparentally inherited nuclear genes. These hermaphrodites therefore become functionally female and may coexist with regular hermaphrodites in so-called gynodioecious populations.3 CMS has been known in plants since Darwin's time4 but was previously unknown in the animal kingdom.5-8 We report the first observation of CMS in animals. It occurs in a freshwater snail population, where some individuals appear unable to sire offspring in controlled crosses and show anatomical, physiological, and behavioral characters consistent with a suppression of the male function. Male sterility is associated with a mitochondrial lineage that underwent a spectacular acceleration of DNA substitution rates affecting the entire mitochondrial genome; this acceleration concerns both synonymous and non-synonymous substitutions and therefore results from increased mitogenome mutation rates. Consequently, mitochondrial haplotype divergence within the population is exceptionally high, matching that observed between snail taxa that diverged 475 million years ago. This result is reminiscent of similar accelerations in mitogenome evolution observed in plant clades where gynodioecy is frequent,9,10 both being consistent with arms-race evolution of genome regions implicated in CMS.11,12 Our study shows that genomic conflicts can trigger independent evolution of similar sex-determination systems in plants and animals and dramatically accelerate molecular evolution.


Subject(s)
DNA, Mitochondrial , Genome, Mitochondrial , Animals , DNA, Mitochondrial/genetics , Evolution, Molecular , Female , Haplotypes , Mitochondria/genetics
4.
Syst Biol ; 70(3): 608-622, 2021 04 15.
Article in English | MEDLINE | ID: mdl-33252676

ABSTRACT

Detecting the signature of selection in coding sequences and associating it with shifts in phenotypic states can unveil genes underlying complex traits. Of the various signatures of selection exhibited at the molecular level, changes in the pattern of selection at protein-coding genes have been of main interest. To this end, phylogenetic branch-site codon models are routinely applied to detect changes in selective patterns along specific branches of the phylogeny. Many of these methods rely on a prespecified partition of the phylogeny into branch categories, thus treating the course of trait evolution as fully resolved and assuming that phenotypic transitions have occurred only at speciation events. Here, we present TraitRELAX, a new phylogenetic model that alleviates these strong assumptions by explicitly accounting for the uncertainty in the evolution of both trait and coding sequences. This joint statistical framework enables the detection of changes in selection intensity upon repeated trait transitions. We evaluated the performance of TraitRELAX using simulations and then applied it to two case studies. Using TraitRELAX, we found an intensification of selection in the primate SEMG2 gene in polygynandrous species compared to species of other mating forms, as well as changes in the intensity of purifying selection operating on sixteen bacterial genes upon transitioning from a free-living to an endosymbiotic lifestyle. [Evolutionary selection; intensification; γ-proteobacteria; genotype-phenotype; relaxation; SEMG2.]


Subject(s)
Evolution, Molecular , Phenotype , Selection, Genetic , Animals , Codon , Models, Genetic , Phylogeny , Primates/genetics
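The "intensification" and "relaxation" keywords point to the RELAX framework that TraitRELAX extends. Assuming TraitRELAX keeps RELAX's parametrization, selection intensity on test branches is modeled by raising each reference ω category to a power k (ω_test = ω_ref^k): k > 1 pushes ω values away from 1 (intensified selection), k < 1 pulls them toward 1 (relaxed selection). The sketch below shows only that transformation, with hypothetical ω values; it does not model the trait or the joint likelihood.

```python
from typing import List


def relax_transform(reference_omegas: List[float], k: float) -> List[float]:
    """Map reference-branch omega categories to test-branch values via omega ** k."""
    if k <= 0:
        raise ValueError("the selection-intensity parameter k must be positive")
    if any(w < 0 for w in reference_omegas):
        raise ValueError("omega values must be non-negative")
    return [w ** k for w in reference_omegas]


def describe(k: float) -> str:
    """k > 1 moves omegas away from 1 (intensified); k < 1 moves them toward 1 (relaxed)."""
    if k > 1:
        return "intensified"
    if k < 1:
        return "relaxed"
    return "unchanged"


# Hypothetical omega categories: purifying, neutral, and positive selection.
reference = [0.1, 1.0, 2.0]
print(relax_transform(reference, 2.0), describe(2.0))  # omegas move away from 1
print(relax_transform(reference, 0.5), describe(0.5))  # omegas move toward 1
```

Note that ω = 1 is a fixed point for any k, so neutral sites are unaffected by the intensity parameter by construction.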
5.
Bioinformatics ; 36(18): 4822-4824, 2020 09 15.
Article in English | MEDLINE | ID: mdl-33085745

ABSTRACT

MOTIVATION: Gene and species tree reconciliation methods are used to interpret gene trees, root them and correct uncertainties that are due to scarcity of signal in multiple sequence alignments. So far, reconciliation tools have not been integrated into standard phylogenetic software, and they lack either performance on certain functions or usability for biologists. RESULTS: We present Treerecs, a phylogenetic software tool based on duplication-loss reconciliation. Treerecs is simple to install and to use. It is fast and versatile, has a graphic output, and can be used along with methods for phylogenetic inference on multiple alignments such as PLL and Seaview. AVAILABILITY AND IMPLEMENTATION: Treerecs is open-source. Its source code (C++, AGPLv3) and manuals are available from https://project.inria.fr/treerecs/.


Subject(s)
Algorithms , Evolution, Molecular , Phylogeny , Sequence Alignment , Software
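A standard building block of duplication-loss reconciliation is LCA mapping: each gene-tree node maps to the lowest common ancestor, in the species tree, of its children's mappings, and a node that maps to the same species-tree node as one of its children is a duplication. The sketch below illustrates that textbook step only; it is not taken from Treerecs's C++ code, counts duplications but not losses, and uses a hypothetical three-species tree.

```python
from typing import Dict, List, Optional, Tuple, Union

# Species tree as child -> parent (root maps to None); gene tree as nested
# pairs whose leaves are labeled by the species each gene was sampled from.
Parents = Dict[str, Optional[str]]
GeneTree = Union[str, Tuple["GeneTree", "GeneTree"]]


def path_to_root(parents: Parents, node: str) -> List[str]:
    path = [node]
    while parents[node] is not None:
        node = parents[node]
        path.append(node)
    return path


def species_lca(parents: Parents, a: str, b: str) -> str:
    ancestors_of_b = set(path_to_root(parents, b))
    for node in path_to_root(parents, a):  # first shared ancestor, bottom-up
        if node in ancestors_of_b:
            return node
    raise ValueError("species are not in the same tree")


def reconcile(gene_tree: GeneTree, parents: Parents) -> Tuple[str, int]:
    """Return (species-tree node this gene node maps to, duplications in this subtree)."""
    if isinstance(gene_tree, str):
        return gene_tree, 0
    left, right = gene_tree
    map_l, dup_l = reconcile(left, parents)
    map_r, dup_r = reconcile(right, parents)
    mapped = species_lca(parents, map_l, map_r)
    # Duplication when the node maps to the same species node as a child.
    is_dup = mapped in (map_l, map_r)
    return mapped, dup_l + dup_r + int(is_dup)


# Hypothetical species tree ((A,B)AB,C)root.
parents: Parents = {"A": "AB", "B": "AB", "AB": "root", "C": "root", "root": None}
print(reconcile((("A", "B"), "C"), parents))         # ('root', 0)
print(reconcile((("A", "B"), ("A", "C")), parents))  # ('root', 1)
```

In the second gene tree, the right subtree (A,C) already maps to the root, so the top node, also mapped to the root, is inferred to be a duplication.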
6.
Nucleic Acids Res ; 48(18): e103, 2020 10 09.
Article in English | MEDLINE | ID: mdl-32941639

ABSTRACT

Adaptive evolution has shaped major biological processes. Finding the protein-coding genes and the sites that have been subjected to adaptation during evolutionary time is a major endeavor. However, very few methods fully automate the identification of positively selected genes, and widespread sources of genetic innovations such as gene duplication and recombination are absent from most pipelines. Here, we developed DGINN, a highly flexible, publicly available pipeline to Detect Genetic INNovations and adaptive evolution in protein-coding genes. Starting from a gene's sequence, DGINN automates all steps of the evolutionary analyses necessary to detect the aforementioned innovations, including the search for homologs in databases, assignment of orthology groups, identification of duplication and recombination events, and detection of positive selection using five methods, which increases precision and allows genes to be ranked when a large panel is analyzed. DGINN was validated on nineteen genes with previously characterized evolutionary histories in primates, including some engaged in host-pathogen arms races. Our results confirm and also expand results from the literature, including novel findings on the Guanylate-binding protein family, GBPs. This establishes DGINN as an efficient tool to automatically detect genetic innovations and adaptive evolution in diverse datasets, from the user's gene of interest to a large gene list in any species range.


Subject(s)
Databases, Genetic , Primates/genetics , Proteins/genetics , Animals , Evolution, Molecular , Genetic Variation , Selection, Genetic
7.
Philos Trans R Soc Lond B Biol Sci ; 374(1777): 20180234, 2019 07 22.
Article in English | MEDLINE | ID: mdl-31154974

ABSTRACT

In evolutionary genomics, researchers have taken an interest in identifying substitutions that underpin convergent phenotypic adaptations. This is a difficult question that requires distinguishing foreground convergent substitutions, which are involved in the convergent phenotype, from background convergent substitutions. The latter may be linked to other adaptations, may be neutral or may be the consequence of mutational biases. Furthermore, there is no generally accepted definition of convergent substitutions. Various methods that use different definitions have been proposed in the literature, resulting in different sets of candidate foreground convergent substitutions. In this article, we first describe the processes that can generate foreground convergent substitutions in coding sequences, separating adaptive from non-adaptive processes. Second, we review methods that have been proposed to detect foreground convergent substitutions in coding sequences and expose the assumptions that underlie them. Finally, we examine their power on simulations of convergent changes (including in the presence of a change in the efficacy of selection) and on empirical alignments. This article is part of the theme issue 'Convergent evolution in the genomics era: new insights and directions'.


Subject(s)
Amino Acids/genetics , Evolution, Molecular , Proteins/genetics , Amino Acids/metabolism , Animals , Genomics , Humans , Models, Genetic , Phylogeny , Proteins/metabolism
8.
Genome Biol ; 20(1): 5, 2019 01 07.
Article in English | MEDLINE | ID: mdl-30616647

ABSTRACT

BACKGROUND: The nearly neutral theory of molecular evolution predicts that the efficacy of natural selection increases with the effective population size. This prediction has been verified by independent observations in diverse taxa, which show that life-history traits are strongly correlated with measures of the efficacy of selection, such as the dN/dS ratio. Surprisingly, avian taxa are an exception to this theory because correlations between life-history traits and dN/dS are apparently absent. Here we explore the effect of GC-biased gene conversion on estimates of substitution rates as a potential driver of these unexpected observations. RESULTS: We analyze the relationship between dN/dS estimated from alignments of 47 avian genomes and several proxies for effective population size. To distinguish the impact of GC-biased gene conversion from selection, we use an approach that accounts for non-stationary base composition and estimate dN/dS separately for changes affected or unaffected by GC-biased gene conversion. This analysis shows that the impact of GC-biased gene conversion on substitution rates can explain the lack of correlations between life-history traits and dN/dS. Strong correlations between life-history traits and dN/dS are recovered after accounting for GC-biased gene conversion. The correlations are robust to variation in base composition and genomic location. CONCLUSIONS: Our study shows that gene sequence evolution across a wide range of avian lineages meets the prediction of the nearly neutral theory: the efficacy of selection increases with effective population size. Moreover, our study illustrates that accounting for GC-biased gene conversion is important to correctly estimate the strength of selection.


Subject(s)
Birds/genetics , Gene Conversion , Genetic Drift , Selection, Genetic , Animals , Base Composition , Birds/growth & development , Chromosomes
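The central move in the analysis above is to estimate dN/dS separately for substitutions that GC-biased gene conversion (gBGC) can or cannot affect. gBGC favours weak-to-strong (A/T to G/C, "W->S") changes and disfavours strong-to-weak ("S->W") ones, while W->W and S->S changes are GC-conservative. The sketch below covers only that partition step, assuming substitutions have already been mapped to a branch and labeled synonymous or nonsynonymous; it does not reproduce the non-stationary model used in the study, and the substitution list is hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

WEAK, STRONG = set("AT"), set("GC")


def gbgc_class(ancestral: str, derived: str) -> str:
    """Classify a nucleotide substitution by its exposure to GC-biased gene conversion."""
    a, d = ancestral.upper(), derived.upper()
    if a not in WEAK | STRONG or d not in WEAK | STRONG:
        raise ValueError("expected one of A, C, G, T")
    if a == d:
        raise ValueError("not a substitution")
    if a in WEAK and d in STRONG:
        return "W->S"  # favoured by gBGC
    if a in STRONG and d in WEAK:
        return "S->W"  # disfavoured by gBGC
    return "GC-conservative"  # W->W or S->S: not affected by gBGC


def tally(subs: Iterable[Tuple[str, str, bool]]) -> Counter:
    """Count substitutions per (gBGC class, N or S) from (ancestral, derived, nonsynonymous)."""
    return Counter((gbgc_class(a, d), "N" if nonsyn else "S") for a, d, nonsyn in subs)


# Hypothetical substitutions mapped onto one branch.
print(tally([("A", "G", True), ("G", "T", False), ("A", "T", True), ("C", "G", False)]))
```

Counts in each class would then be divided by the corresponding numbers of nonsynonymous and synonymous opportunities to obtain class-specific dN and dS.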
10.
Mol Biol Evol ; 35(12): 2900-2912, 2018 12 01.
Article in English | MEDLINE | ID: mdl-30247705

ABSTRACT

The rate of molecular evolution varies widely among species. Life history traits (LHTs) have been proposed as a major driver of these variations. However, the relative contribution of each trait is poorly understood. Here, we test the influence of metabolic rate (MR), longevity, and generation time (GT) on the nuclear and mitochondrial synonymous substitution rates using a group of isopod species that have made multiple independent transitions to subterranean environments. Subterranean species have repeatedly evolved a lower MR, a longer lifespan and a longer GT. We assembled the nuclear transcriptomes and the mitochondrial genomes of 13 pairs of closely related isopods, each pair composed of one surface and one subterranean species. We found that subterranean species have a lower rate of nuclear synonymous substitution than surface species whereas the mitochondrial rate remained unchanged. We propose that this decoupling between nuclear and mitochondrial rates comes from different DNA replication processes in these two compartments. In isopods, the nuclear rate is probably tightly controlled by GT alone. In contrast, mitochondrial genomes appear to replicate and mutate at a rate independent of LHTs. These results are incongruent with previous studies, which were mostly devoted to vertebrates. We suggest that this incongruence can be explained by developmental differences between animal clades, with a quiescent period during female gametogenesis in mammals and birds which imposes a nuclear and mitochondrial rate coupling, as opposed to the continuous gametogenesis observed in most arthropods.


Subject(s)
Evolution, Molecular , Genome, Mitochondrial , Isopoda/genetics , Life History Traits , Animals , DNA Replication , Ecosystem , Electron Transport , Isopoda/metabolism , Isopoda/radiation effects , Protein Biosynthesis , Selection, Genetic
11.
Mol Biol Evol ; 35(9): 2296-2306, 2018 09 01.
Article in English | MEDLINE | ID: mdl-29986048

ABSTRACT

In the history of life, some phenotypes have been acquired several times independently, through convergent evolution. Recently, many genome-scale studies have been devoted to identifying nucleotides or amino acids that changed in a convergent manner when the convergent phenotypes evolved. These efforts have had mixed results, probably because of differences in the detection methods, and because of conceptual differences about the definition of a convergent substitution. Some methods contend that substitutions are convergent only if they occur on all branches where the phenotype changed toward the exact same state at a given nucleotide or amino acid position. Others are much looser in their requirements and define a convergent substitution as one that leads the site at which it occurs to prefer a phylogeny in which species with the convergent phenotype group together. Here, we propose looking for convergent shifts in amino acid preferences instead of convergent substitutions to the exact same amino acid. We define convergent shifts as substitutions that occur on all branches where the phenotype changed and that correspond to a change in the type of amino acid preferred at this position. We implement the corresponding model in a method named PCOC. We show on simulations that PCOC recovers convergent shifts better than existing methods in terms of sensitivity and specificity. We test it on a plant protein alignment where convergent evolution has been studied in detail and find that our method recovers several previously identified convergent substitutions and proposes credible new candidates.


Subject(s)
Amino Acid Substitution , Evolution, Molecular , Genetic Techniques , Models, Genetic , Animals , Cyperaceae/genetics , Mammals/genetics
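The two definitions contrasted above can be made concrete on a single site. In the sketch below, the strict test requires every foreground branch to substitute to the identical amino acid, while the shift test only requires every foreground branch to move the site into the same new amino-acid class. The six-class grouping is a hypothetical stand-in for amino-acid preferences; PCOC itself infers shifts under a phylogenetic model rather than by comparing fixed classes, and the example site is invented.

```python
from typing import Dict, List, Tuple

# Each foreground branch at one site: (ancestral amino acid, derived amino acid).
Branch = Tuple[str, str]


def strict_convergence(branches: List[Branch]) -> bool:
    """True if every foreground branch substitutes to the same amino acid."""
    return all(a != d for a, d in branches) and len({d for _, d in branches}) == 1


# Hypothetical coarse classes covering all 20 amino acids.
AA_CLASS: Dict[str, str] = {
    **dict.fromkeys("AVLIMC", "hydrophobic"),
    **dict.fromkeys("FWY", "aromatic"),
    **dict.fromkeys("STNQ", "polar"),
    **dict.fromkeys("DE", "acidic"),
    **dict.fromkeys("KRH", "basic"),
    **dict.fromkeys("GP", "special"),
}


def shift_convergence(branches: List[Branch]) -> bool:
    """True if every foreground branch moves the site into the same new class."""
    targets = set()
    for a, d in branches:
        if a == d or AA_CLASS[a] == AA_CLASS[d]:
            return False  # no change, or a change within the same class
        targets.add(AA_CLASS[d])
    return len(targets) == 1


fg = [("S", "L"), ("T", "I")]  # hypothetical site on two foreground branches
print(strict_convergence(fg), shift_convergence(fg))  # False True
```

The two tests can also disagree in the other direction: [("S", "L"), ("A", "L")] is strictly convergent, yet the A to L change stays within one class and so is not a shift under this grouping.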
12.
Mol Biol Evol ; 35(3): 734-742, 2018 Mar 01.
Article in English | MEDLINE | ID: mdl-29220511

ABSTRACT

The measurement of synonymous and nonsynonymous substitution rates (dS and dN) is useful for assessing selection operating on protein sequences or for investigating mutational processes affecting genomes. In particular, the ratio dN/dS is expected to be a good proxy for ω, the ratio of fixation probabilities of nonsynonymous mutations relative to that of neutral mutations. Standard methods for estimating dN, dS, or ω rely on the assumption that the base composition of sequences is at the equilibrium of the evolutionary process. In many clades, this assumption of stationarity is in fact incorrect, and we show here through simulations and analyses of empirical data that nonstationarity biases the estimates of dN, dS, and ω. We show that the bias in the estimate of ω can be fixed by explicitly taking nonstationarity into account in the modeling of codon evolution, in a maximum likelihood framework. Moreover, we propose an exact method for estimating dN and dS on branches, based on stochastic mapping, that can take nonstationarity into account. This method can be directly applied to any kind of codon evolution model, as long as neutrality is clearly parameterized.

13.
Genome Biol Evol ; 9(12): 3413-3431, 2017 12 01.
Article in English | MEDLINE | ID: mdl-29220487

ABSTRACT

Horizontal gene transfer (HGT) is considered a major source of innovation in bacteria and, as such, is expected to drive adaptation to new ecological niches. However, among the many genes acquired through HGT along the diversification history of genomes, only a fraction may have actively contributed to sustained ecological adaptation. We used a phylogenetic approach accounting for the transfer of genes (or groups of genes) to estimate the history of genomes in Agrobacterium biovar 1, a diverse group of soil and plant-dwelling bacterial species. We identified clade-specific blocks of cotransferred genes encoding coherent biochemical pathways that may have contributed to the evolutionary success of key Agrobacterium clades. This pattern of gene coevolution rejects a neutral model of transfer, in which neighboring genes would be transferred independently of their function, and instead suggests purifying selection on collectively coded acquired pathways. The acquisition of these synapomorphic blocks of cofunctioning genes probably drove the ecological diversification of Agrobacterium and defined features of ancestral ecological niches, which consistently hint at a strong selective role of host plant rhizospheres.


Subject(s)
Agrobacterium/cytology , Agrobacterium/genetics , Biological Evolution , Ecology , Genetic Variation , Genome, Bacterial , Computational Biology , High-Throughput Nucleotide Sequencing , Phylogeny , Software
14.
Genome Biol Evol ; 9(10): 2506-2509, 2017 10 01.
Article in English | MEDLINE | ID: mdl-28981643

ABSTRACT

Given the importance of meiotic recombination in biology, there is a need to develop robust methods to estimate meiotic recombination rates. A popular approach, called the Marey map approach, relies on comparing genetic and physical maps of a chromosome to estimate local recombination rates. In the past, we have implemented this approach in an R package called MareyMap, which includes many functionalities useful for obtaining reliable recombination rate estimates in a semi-automated way. MareyMap has been used repeatedly in studies looking at the effect of recombination on genome evolution. Here, we propose a simpler, user-friendly web service version of MareyMap, called MareyMap Online, which allows users to obtain recombination rates in a few clicks, either from their own data or from a publicly available database that we provide. When the analysis is done, users are asked whether their curated data can be placed in the database and shared with other users, which we hope will make future meta-analyses of recombination rates across many species easy.


Subject(s)
Recombination, Genetic , Software , Animals , Databases, Genetic , Humans , Internet , Meiosis , Plants/genetics
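In a Marey map, genetic position (cM) is plotted against physical position (Mb) along a chromosome, and the local recombination rate is the slope of that curve, in cM/Mb. MareyMap provides its own interpolation methods; the sketch below substitutes the simplest option, an ordinary least-squares slope over a sliding window centred on each marker. The window size and marker data are hypothetical.

```python
from typing import List, Sequence, Tuple


def local_recombination_rates(
    physical_mb: Sequence[float],
    genetic_cm: Sequence[float],
    window_mb: float = 2.0,
) -> List[Tuple[float, float]]:
    """Estimate cM/Mb at each marker as the least-squares slope of genetic on
    physical position over all markers within +/- window_mb / 2."""
    if len(physical_mb) != len(genetic_cm):
        raise ValueError("marker lists must have the same length")
    half = window_mb / 2
    rates: List[Tuple[float, float]] = []
    for x0 in physical_mb:
        pts = [(x, y) for x, y in zip(physical_mb, genetic_cm) if abs(x - x0) <= half]
        n = len(pts)
        if n < 2:
            rates.append((x0, float("nan")))  # too few markers in the window
            continue
        mx = sum(x for x, _ in pts) / n
        my = sum(y for _, y in pts) / n
        sxx = sum((x - mx) ** 2 for x, _ in pts)
        sxy = sum((x - mx) * (y - my) for x, y in pts)
        rates.append((x0, sxy / sxx if sxx > 0 else float("nan")))
    return rates


# Hypothetical markers on a chromosome with a constant 2 cM/Mb rate.
for position, rate in local_recombination_rates([0, 1, 2, 3, 4], [0, 2, 4, 6, 8]):
    print(position, rate)
```

A wider window smooths out marker-ordering noise at the cost of blurring real rate changes, which is the same trade-off any interpolation method on a Marey map has to make.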
15.
Genome Res ; 27(6): 1016-1028, 2017 06.
Article in English | MEDLINE | ID: mdl-28424354

ABSTRACT

The evolutionary origin of the striking genome size variations found in eukaryotes remains enigmatic. The effective size of populations, by controlling selection efficacy, is expected to be a key parameter underlying genome size evolution. However, this hypothesis has proved difficult to investigate using empirical data sets. Here, we tested this hypothesis using 22 de novo transcriptomes and low-coverage genomes of asellid isopods, which represent 11 independent habitat shifts from surface water to resource-poor groundwater. We show that these habitat shifts are associated with higher transcriptome-wide dN/dS. After ruling out the role of positive selection and pseudogenization, we show that these transcriptome-wide dN/dS increases are the consequence of a reduction in selection efficacy imposed by the smaller effective population size of subterranean species. This reduction is paralleled by a substantial increase in genome size (25% increase on average), an increase also confirmed in subterranean decapods and mollusks. We also control for an adaptive impact of genome size on life history traits but find no correlation between body size, or growth rate, and genome size. We show instead that the independent increases in genome size measured in subterranean isopods are the direct consequence of increasing invasion rates by repeat elements, which are less efficiently purged by purifying selection. Unlike selection efficacy, polymorphism is not correlated with genome size. We propose that recent demographic fluctuations and the difficulty of observing polymorphism variation in polymorphism-poor species can obscure the link between effective population size and genome size when polymorphism data are used alone.


Subject(s)
Genetic Speciation , Genome Size , Isopoda/genetics , Phylogeny , Selection, Genetic , Animals , Decapoda/classification , Decapoda/genetics , High-Throughput Nucleotide Sequencing , Isopoda/classification , Microsatellite Repeats , Mollusca/classification , Mollusca/genetics , Polymorphism, Genetic , Transcriptome
16.
Genome Biol ; 18(1): 29, 2017 02 15.
Article in English | MEDLINE | ID: mdl-28202034

ABSTRACT

BACKGROUND: Comparative transcriptomics can answer many questions in developmental and evolutionary developmental biology. Most transcriptomic studies start by showing global patterns of variation in transcriptomes that differ between species or organs through developmental time. However, little is known about the kinds of expression differences that shape these patterns. RESULTS: We compared transcriptomes during the development of two morphologically distinct serial organs, the upper and lower first molars of the mouse. We found that these two types of teeth largely share the same gene expression dynamics but that three major transcriptomic signatures distinguish them, all of which are shaped by differences in the relative abundance of different cell types. First, lower/upper molar differences are maintained throughout morphogenesis and stem from differences in the relative abundance of mesenchyme and from constant differences in gene expression within tissues. Second, there are clear time-shift differences in the transcriptomes of the two molars related to cusp tissue abundance. Third, the transcriptomes differ most during early-mid crown morphogenesis, corresponding to exaggerated morphogenetic processes in the upper molar involving fewer mitotic cells but more migrating cells. From these findings, we formulate hypotheses about the mechanisms enabling the two molars to reach different phenotypes. We also successfully applied our approach to forelimb and hindlimb development. CONCLUSIONS: Gene expression in a complex tissue reflects not only transcriptional regulation but also abundance of different cell types. This knowledge provides valuable insights into the cellular processes underpinning differences in organ development. Our approach should be applicable to most comparative developmental contexts.


Subject(s)
Developmental Biology , Gene Expression Regulation, Developmental , Transcriptome , Animals , Developmental Biology/methods , Epithelium/embryology , Epithelium/metabolism , Female , Male , Mesoderm/embryology , Mesoderm/metabolism , Mice , Molar/embryology , Molar/metabolism , Morphogenesis/genetics , Mosaicism , Organogenesis/genetics , Signal Transduction
17.
PLoS One ; 11(8): e0159559, 2016.
Article in English | MEDLINE | ID: mdl-27513924

ABSTRACT

MOTIVATIONS: Gene trees inferred solely from multiple alignments of homologous sequences often contain weakly supported and uncertain branches. Information for their full resolution may lie in the dependency between gene families and their genomic context. Integrative methods, using species tree information in addition to sequence information, often rely on a computationally intensive tree space search, which precludes their application to large genomic databases. RESULTS: We propose a new method, called ProfileNJ, that takes a gene tree with statistical supports on its branches and corrects its weakly supported parts by using a combination of information from a species tree and a distance matrix. Its low running time enabled us to use it on the whole Ensembl Compara database, for which we propose an alternative, arguably more plausible set of gene trees. This allowed us to perform a genome-wide analysis of duplication and loss patterns on the history of 63 eukaryote species and to predict ancestral gene content and order for all ancestors along the phylogeny. AVAILABILITY: A web interface called RefineTree, including ProfileNJ as well as other gene tree correction methods, which we also test on the Ensembl gene families, is available at: http://www-ens.iro.umontreal.ca/~adbit/polytomysolver.html. The code of ProfileNJ, as well as the set of gene trees corrected by ProfileNJ from Ensembl Compara version 73 families, is also available.


Subject(s)
Algorithms , Computational Biology/methods , Evolution, Molecular , Genes/genetics , Genome/genetics , Phylogeny , Animals , Humans , Sequence Analysis, DNA
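The first step described for ProfileNJ, locating the weakly supported parts of a gene tree, can be sketched as contracting every internal branch whose support falls below a threshold, which leaves polytomies to be resolved. This is an assumed reading of the abstract, not ProfileNJ's code: the Node class, the 0.5 threshold and the example tree are hypothetical, and the resolution step that uses the species tree and distance matrix is not shown.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: Optional[str] = None        # tip label; None for internal nodes
    support: Optional[float] = None   # support of the branch leading to this node
    children: List["Node"] = field(default_factory=list)


def contract_weak(node: Node, threshold: float) -> Node:
    """Return a copy of the tree with weakly supported internal branches collapsed."""
    kept: List[Node] = []
    for child in node.children:
        child = contract_weak(child, threshold)
        weak = child.support is not None and child.support < threshold
        if child.children and weak:
            kept.extend(child.children)  # splice the grandchildren into a polytomy
        else:
            kept.append(child)
    return Node(node.name, node.support, kept)


def to_newick(node: Node) -> str:
    """Topology-only Newick string (no branch lengths, supports, or trailing ';')."""
    if not node.children:
        return node.name or ""
    return "(" + ",".join(to_newick(c) for c in node.children) + ")"


# Hypothetical gene tree ((a,b)0.95,(c,(d,e)0.4)0.9) with branch supports.
gene_tree = Node(children=[
    Node(support=0.95, children=[Node("a"), Node("b")]),
    Node(support=0.9, children=[Node("c"), Node(support=0.4, children=[Node("d"), Node("e")])]),
])
print(to_newick(contract_weak(gene_tree, threshold=0.5)))  # ((a,b),(c,d,e))
```

The polytomy (c,d,e) is exactly the kind of unresolved region that the species tree and distance information would then be used to refine.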
18.
Genome Biol Evol ; 8(8): 2427-41, 2016 08 25.
Article in English | MEDLINE | ID: mdl-27401173

ABSTRACT

Gene sequences are the target of evolution operating at different levels, including the nucleotide, codon, and amino acid levels. Disentangling the impact of those different levels on gene sequences requires developing a probabilistic model with three layers. Here we present SENCA (site evolution of nucleotides, codons, and amino acids), a codon substitution model that separately describes 1) nucleotide processes that apply to all sites of a sequence, such as the mutational bias, 2) preferences between synonymous codons, and 3) preferences among amino acids. We argue that most synonymous substitutions are not neutral and that SENCA provides more accurate estimates of selection compared with more classical codon sequence models. We study the forces that drive the evolution of genomic content, intraspecifically in the core genome of 21 prokaryotes and interspecifically for five Enterobacteria. We recover a universal mutational bias toward AT and find that taking into account selection on synonymous codon usage has consequences for the measurement of selection on nonsynonymous substitutions. We also confirm that codon usage bias is mostly driven by selection on preferred codons. We propose new summary statistics to measure the relative importance of the different evolutionary processes acting on sequences.


Subject(s)
Amino Acid Sequence/genetics , Codon/genetics , Evolution, Molecular , Selection, Genetic , Amino Acid Substitution , Humans , Models, Genetic , Models, Statistical , Mutation , Nucleotides/genetics
19.
Genome Biol Evol ; 8(5): 1427-39, 2016 05 22.
Article in English | MEDLINE | ID: mdl-27190002

ABSTRACT

Models of evolution by genome rearrangements are prone to two types of flaws: one is to ignore the diversity of susceptibility to breakage across genomic regions, and the other is to suppose that susceptibility values are given. Without necessarily supposing their precise localization, we call "solid" the regions that are unlikely to be broken by rearrangements and "fragile" the regions outside solid ones. We propose a model of evolution by inversions where breakage probabilities vary across fragile regions and over time. It includes, as a special case, the uniform breakage model on the nucleotide sequence, where breakage probabilities are proportional to fragile region lengths. This is very different from the frequently used pseudouniform model, where all fragile regions have the same probability of breaking. Estimations of rearrangement distances based on the pseudouniform model completely fail on simulations with the truly uniform model. On pairs of amniote genomes, we show that identifying coding genes with solid regions yields incoherent distance estimations, especially with the pseudouniform model and, to a lesser extent, with the truly uniform model. This incoherence is resolved when we coestimate the number of fragile regions with the rearrangement distance. The estimated number of fragile regions is surprisingly small, suggesting that a minority of regions are recurrently used by rearrangements. Estimations for several pairs of genomes at different divergence times are in agreement with a slowly evolvable colocalization of active genomic regions in the cell.


Subject(s)
Evolution, Molecular , Genome, Human , Genomics , Chromosome Inversion/genetics , Gene Rearrangement , Genetic Variation , Humans , Models, Genetic
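The difference between the uniform and pseudouniform breakage models can be shown with a few lines of arithmetic. The sketch below, with hypothetical fragile-region lengths, computes per-region breakage probabilities under each model and the chance that two independent breaks land in the same fragile region, one simple way to see why the two models imply different amounts of breakpoint reuse. It does not implement the authors' time-varying model or their distance estimator.

```python
from typing import List, Sequence


def uniform_breakage(lengths: Sequence[float]) -> List[float]:
    """Uniform model on the nucleotide sequence: P(break in region) proportional to its length."""
    total = float(sum(lengths))
    return [length / total for length in lengths]


def pseudouniform_breakage(lengths: Sequence[float]) -> List[float]:
    """Pseudouniform model: every fragile region is equally likely to break."""
    return [1.0 / len(lengths)] * len(lengths)


def same_region_probability(probs: Sequence[float]) -> float:
    """Chance that two independent breaks fall in the same fragile region."""
    return sum(p * p for p in probs)


# Hypothetical fragile-region lengths (kb): two short regions and one long one.
lengths = [1, 1, 8]
for model in (uniform_breakage, pseudouniform_breakage):
    probs = model(lengths)
    print(model.__name__, [round(p, 3) for p in probs], round(same_region_probability(probs), 3))
# uniform_breakage [0.1, 0.1, 0.8] 0.66
# pseudouniform_breakage [0.333, 0.333, 0.333] 0.333
```

With unequal region lengths, the uniform model concentrates breaks in long regions and roughly doubles the chance of reuse here, so a distance estimator calibrated on the pseudouniform model will misjudge how many rearrangements are hidden by reused breakpoints.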
20.
BMC Bioinformatics ; 16 Suppl 14: S7, 2015.
Article in English | MEDLINE | ID: mdl-26451469

ABSTRACT

We study statistical estimators of the number of genomic events separating two genomes under a Double Cut and Join (DCJ) rearrangement model, using the method of moments. We first propose an exact, closed-form, analytically invertible formula for the expected number of breakpoints after a given number of DCJs. This improves on the previously proposed formula, which was heuristic, recursive and computationally slower. Then we explore the analogies of genome evolution by DCJ with the evolution of binary sequences under substitutions, permutations under transpositions, and random graphs. Each of these analogies has been presented in the literature with intuitive justifications and used to import results from better-known fields. We formalize the relations by proving a correspondence between moments in sequence and genome evolution, provided substitutions appear four by four in the corresponding model. Finally, we prove a bounded error on two estimators of the number of cycles in the breakpoint graph after a given number of rearrangements, by an analogy with cycles in permutations and components in random graphs.


Subject(s)
Algorithms , Evolution, Molecular , Gene Rearrangement , Genome , Genomics/methods , Models, Genetic , Computer Simulation , Humans
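The abstract does not give the DCJ breakpoint formula, so it is not reproduced here. The sketch below instead illustrates the method of moments on the simplest analogous model the abstract mentions, binary sequences under substitutions: derive the expected number of differences after k events in closed form, then invert it to estimate k from an observed count. The sequence length and event count are hypothetical.

```python
import math


def expected_differences(n: int, k: int) -> float:
    """Expected number of differing sites after k random single-site flips on n binary sites.

    Each flip hits a given site with probability 1/n; the site differs from its
    starting state iff it was hit an odd number of times, which has probability
    (1 - (1 - 2/n) ** k) / 2.
    """
    return n * (1 - (1 - 2 / n) ** k) / 2


def moment_estimate_k(n: int, d: float) -> float:
    """Method-of-moments estimate of k: solve expected_differences(n, k) = d for k."""
    if n <= 2:
        raise ValueError("need more than two sites")
    if not 0 <= d < n / 2:
        raise ValueError("d must lie in [0, n/2) for a finite estimate")
    return math.log(1 - 2 * d / n) / math.log(1 - 2 / n)


n_sites, true_k = 1000, 300
d_expected = expected_differences(n_sites, true_k)
print(d_expected, moment_estimate_k(n_sites, d_expected))  # second value is about 300
```

The estimate diverges as d approaches n/2, the saturation level at which the sequences look random relative to each other; a bounded-error analysis such as the one in the abstract matters most in that regime.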